{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e0c2f11f",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/MyScaleIndexDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "307804a3-c02b-4a57-ac0d-172c30ddc851",
   "metadata": {},
   "source": [
    "# MyScale Vector Store\n",
    "In this notebook we are going to show a quick demo of using the MyScaleVectorStore."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c12f55a9",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1edec46",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "#### Creating a MyScale Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d48af8e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50ad978c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from os import environ\n",
    "import clickhouse_connect\n",
    "\n",
    "environ[\"OPENAI_API_KEY\"] = \"sk-*\"\n",
    "\n",
    "# initialize client\n",
    "client = clickhouse_connect.get_client(\n",
    "    host=\"YOUR_CLUSTER_HOST\",\n",
    "    port=8443,\n",
    "    username=\"YOUR_USERNAME\",\n",
    "    password=\"YOUR_CLUSTER_PASSWORD\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "#### Load documents, build and store the VectorStoreIndex with MyScaleVectorStore\n",
    "\n",
    "Here we will use a set of Paul Graham essays to provide the text to turn into embeddings, store in a ``MyScaleVectorStore`` and query to find context for our LLM QnA loop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.vector_stores import MyScaleVectorStore\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: a5f2737c-ed18-4e5d-ab9a-75955edb816d\n",
      "Number of Documents:  1\n"
     ]
    }
   ],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"../data/paul_graham\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id)\n",
    "print(\"Number of Documents: \", len(documents))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b6afe88c",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d09a78f",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fe3dc84",
   "metadata": {},
   "source": [
    "You can process your files individually using [SimpleDirectoryReader](/examples/data_connectors/simple_directory_reader.ipynb):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4febd54a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "../data/paul_graham/paul_graham_essay.txt\n"
     ]
    }
   ],
   "source": [
    "loader = SimpleDirectoryReader(\"./data/paul_graham/\")\n",
    "documents = loader.load_data()\n",
    "for file in loader.input_files:\n",
    "    print(file)\n",
    "    # Here is where you would do any preprocessing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba1558b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# initialize with metadata filter and store indexes\n",
    "from llama_index.storage.storage_context import StorageContext\n",
    "\n",
    "for document in documents:\n",
    "    document.metadata = {\"user_id\": \"123\", \"favorite_color\": \"blue\"}\n",
    "vector_store = MyScaleVectorStore(myscale_client=client)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04304299-fc3e-40a0-8600-f50c3292767e",
   "metadata": {},
   "source": [
    "#### Query Index\n",
    "\n",
    "Now MyScale vector store supports filter search and hybrid search\n",
    "\n",
    "You can learn more about [query_engine](/module_guides/deploying/query_engine/root.md) and [retriever](/module_guides/querying/retriever/root.md)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35369eda",
   "metadata": {},
   "outputs": [],
   "source": [
    "import textwrap\n",
    "\n",
    "from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters\n",
    "\n",
    "# set Logging to DEBUG for more detailed outputs\n",
    "query_engine = index.as_query_engine(\n",
    "    filters=MetadataFilters(\n",
    "        filters=[\n",
    "            ExactMatchFilter(key=\"user_id\", value=\"123\"),\n",
    "        ]\n",
    "    ),\n",
    "    similarity_top_k=2,\n",
    "    vector_store_query_mode=\"hybrid\",\n",
    ")\n",
    "response = query_engine.query(\"What did the author learn?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a732d16f0a29f8ab",
   "metadata": {},
   "source": [
    "#### Clear All Indexes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "552c203fd054d771",
   "metadata": {},
   "outputs": [],
   "source": [
    "for document in documents:\n",
    "    index.delete_ref_doc(document.doc_id)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
